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Algorithms for Leader Selection in 
Stochastically Forced Consensus Networks 

Fu Lin, Makan Fardad, and Mihailo R. Jovanovic 

Abstract 

We examine the leader selection problem in stochastically forced consensus networks. A node is a 
leader if, in addition to relative information from its neighbors, it also has an access to its own state. 
This problem arises in several applications including control of vehicular formations and localization 
in sensor networks. We are interested in selecting an a priori specified number of leaders such that the 
steady-state variance of the deviation from consensus is minimized. Even though we establish convexity 
of the objective function, combinatorial nature of constraints makes determination of the global minimum 
difficult for large networks. We introduce a convex relaxation of constraints to obtain a lower bound 
on the global optimal value. We also use a simple but efficient greedy algorithm and the alternating 
direction method of multipliers to compute upper bounds. Furthermore, for networks with noise-free 
leaders that perfectly follow their desired trajectories, a sequence of convex relaxations is used to identify 
the leaders. Several examples ranging from regular lattices to random graphs are provided to illustrate 
the effectiveness of the developed algorithms. 

Index Terms 

Alternating direction method of multipliers, consensus networks, convex optimization, convex re- 
laxations, greedy algorithm, leader selection, performance bounds, semidefinite programming, sensor 
selection, sparsity, variance amplification. 

I. Introduction 

Reaching consensus in a decentralized fashion is an important problem in network science [1]. 
This problem is often encountered in social networks where a group of individuals is trying to 
agree on a certain issue [2], [3]. A related problem has been studied extensively in computer 
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science with the objective of distributing evenly computational load over a network of proces- 
sors [4], [5]. Recently, consensus problem has received considerable attention in the context 
of distributed control [6]-[9]. For example, in cooperative control of vehicular formations, it 
is desired to use local interactions between vehicles in order to reach agreement on quantities 
such as heading angle, velocity, and inter-vehicular spacing. Since vehicles have to maintain 
agreement in the presence of uncertainty it is of importance to study robustness of consensus. 
Several authors have recently used the steady-state variance of the deviation from consensus to 
characterize performance limitations of stochastically forced networks [10]— [16]. 

In this paper, we consider undirected consensus networks with two groups of nodes. Ordinary 
nodes, so-called followers, form their action using relative information exchange with their 
neighbors; special nodes, so-called leaders, also have access to their own states. This setting 
may arise in the control of vehicular formations where all vehicles are equipped with ranging 
devices (that provide information about relative distances with respect to their neighbors), and 
the leaders additionally have GPS devices (that provide information with respect to a global 
frame of reference). 

We are interested in assigning an a priori specified number of nodes as leaders in order to 
minimize the steady-state variance of the deviation from consensus. For undirected networks in 
which all nodes are subject to stochastic disturbances, we establish convexity of the objective 
function. In spite of this, combinatorial nature of Boolean constraints (a node is either a leader 
or it is not) makes determination of the global minimum challenging for large networks. Instead, 
we focus on computing lower and upper bounds on the global optimal value. Convex relaxation 
of Boolean constraints is used to obtain a lower bound, and two different algorithms are used to 
obtain an upper bound and to identify leaders. The first algorithm utilizes one-leader- at-a-time 
(greedy) approach followed by a swap procedure that improves performance by checking possible 
swaps between leaders and followers. In both steps, algorithmic complexity is significantly 
reduced by exploiting structure of low-rank modifications to Laplacian matrices. The second 
algorithm utilizes the alternating direction method of multipliers (ADMM) [17] which is capable 
of handling the nonconvex Boolean constraints by a simple projection. Computational efficiency 
of these algorithms makes them well-suited for establishing achievable performance bounds for 
leader selection problem in large stochastically forced networks. 

Following [18]— [21], we also examine consensus networks in which leaders follow desired 
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trajectories at all times. For this idealized case, the identification of noise-free leaders is addition- 
ally complicated by nonconvexity of the objective function. For consensus networks with at least 
one leader, adding leaders always improves performance [18]. In view of this, a greedy algorithm 
that selects one leader at a time by assigning the node that leads to the largest performance im- 
provement as a leader was proposed in [18]. Furthermore, it was proved in [20] that the variance 
of deviation from consensus is a supermodular function of the set of noise-free leaders. Thus, the 
supermodular optimization framework in conjunction with the greedy algorithm can be used to 
provide selection of leaders that is within a provable bound from globally optimal solution [20]. 

In contrast to the above references, we use convex optimization to select noise-free leaders. 
An alternative explicit expression for the objective function that we provide is used to identify 
the source of nonconvexity and to suggest an LMI-based convex relaxation. In addition to this, 
we relax the hard Boolean-valued constraint on the number of leaders with a soft one. This is 
achieved by augmenting the objective function with the sparsity-promoting term that penalizes 
the l\ norm of the vector of optimization variables [22], [23]. The t\ norm provides a means for 
obtaining a sparse solution whose nonzero elements identify the leaders. The developed algorithm 
produces a tradeoff curve between the number of noise-free leaders and the variance of the 
deviation from consensus by solving a parameterized family of convex optimization problems. 

The controllability of leader-follower consensus networks is also an active area of research [24]- 
[27]. Recent efforts have focused on characterizing graph-theoretic conditions for controllability 
of the network in which a number of pre- specified leaders act as control inputs. In contrast, 
the leader selection problem aims at identifying leaders that are most effective in maintaining 
consensus in the presence of disturbances. Other related work on augmenting topologies of 
networks to improve algebraic connectivity includes [28]-[30]. 

The paper is organized as follows. In Section we formulate the leader selection problem 



and establish connections with the sensor selection problem. In Section III we propose an LMI- 
based convex relaxation of the objective function in the noise-free leader selection problem. 
Furthermore, instead of imposing Boolean constraints, we augment the objective function with the 



l\ norm of the optimization variable. In Section IV we develop efficient algorithms to compute 
lower and upper bounds on the global optimal value for the noise-corrupted leader selection 
problem. Finally, we conclude the paper with a summary of our contributions in Section IVJ 
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II. Problem formulation 

In this section, we formulate the noise-corrupted and noise-free leader selection problems 
in consensus networks and make connections to sensor selection problem in sensor networks. 
Furthermore, we establish an equivalence between the noise-corrupted and noise-free leader 
selection problems when all leaders use arbitrarily large feedback gains on their own states. 

A. Leader selection problem in consensus networks 
We consider n single-integrators 

Ipi = Ui + Wi, i = 1, ... ,n 

where ipi is the scalar state, Ui is the control input, and Wi is the white stochastic disturbance with 
zero-mean and unit- variance. A node is a follower if it uses only relative information exchange 
with its neighbors to form its control action, 

A node is a leader if, in addition to relative information exchange with its neighbors, it also has 
access to its own state 

Ui = -^2(ipi- ipj) ~ Kiipi. 

Here, Ki is a positive number and Hi is the set of all nodes that node % communicates with. 

The communication network is modeled by a connected, undirected graph; thus, the graph 
Laplacian L is a symmetric positive semidefinite matrix with a single eigenvalue at zero and 
the corresponding eigenvector 1 of all ones. A state- space representation of the leader-follower 
consensus network is given by 

ip — —(L + D K D X ) ip + w (1) 

where 

D K := diag (k) , D x := diag (x) 

are diagonal matrices formed from the vectors k—[ki ■ ■ ■ n n } T and x — [xi ■ ■ ■ x n ] T . Here, 
x is a Boolean- valued vector with its ith entry Xj G {0, 1}, indicating that node i is a leader if 
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Xi = 1 and that node i is a follower if x , = 0. In connected networks with at least one leader 
L + D K D X is a positive definite matrix and the steady-state covariance of tp is determined by 



E := lim£ U(t) ip T (t)) 

e -(L+D K D :c )t e -(L+D K D x ) T t ^ 





= ^(L + AA)- 1 . 
Following [12], [15], we use the total steady- state variance 

trace (S) = - trace ((L + D K D x )~ l ) (2) 

to quantify performance of consensus networks subject to stochastic disturbances. 

We are interested in identifying Ni leaders that are most effective in reducing the steady-state 
variance ([2]). For an a priori specified number of leaders jVj < n, the leader selection problem 
can thus be formulated as 

minimize J{x) = trace ((L + 

X 

subject to Xi G {0,1}, i = l,...,n (LSI) 

t T X = Nr. 



In (LSI), the number of leaders Ni as well as the matrices L and D K are the problem data, and 



the vector x is the optimization variable. As shown in Section IV for a positive definite matrix 



L + D K D X , the objective function J in (LSI) is a convex function of x. The challenging aspect 



of (LSI) comes from the nonconvex Boolean constraints Xi G {0,1}; in general, finding the 



solution to (LSI) requires an intractable combinatorial search. 



Since the leaders are subject to stochastic disturbances, we refer to |(LSl) as the noise-corrupted 



leader selection problem. We also consider the selection of noise-free leaders which follow their 
desired trajectories at all times [18]. Equivalently, in coordinates that determine deviation from 
the desired trajectory, the state of every leader is identically equal to zero, thereby yielding only 
the dynamics of the followers 

V7 = ~ L f^f + w f- 

Here, Lj is obtained from L by eliminating all rows and columns associated with the leaders. 
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Thus, the problem of selecting leaders that minimize the steady-state variance of ipj amounts to 

minimize J fix) 

X 

subject to %i E {0, 1}, 



trace (Lj x ) 



x { E 
t T x = 



n 



(LS2) 



As in (LSI) the Boolean constraints X{ E {0, 1} are nonconvex. Furthermore, as we demonstrate 



in Section III the objective function Jf in (LS2) is a nonconvex function of x. 

In what follows, we establish equivalence between the noise-corrupted and noise-free leader 



selection problems (LSI) and (LS2) when all leaders use arbitrarily large feedback gains on 
their own states. Partitioning tp into the state of the leader nodes ipi and the state of the follower 
nodes if>f brings system ([!]) to the following forrrQ 



Li + D 

L 



L 














+ 




Lf 








Wf 



(3) 



Here, D Kl := diag (ki) with K\ E E ! being the vector of feedback gains associated with the 
leaders. Taking the trace of the inverse of the 2x2 block matrix in ([3]) yields 

J = trace (L7 1 + L^L^S^L.L] 1 + S' 1 ) 

where 



S, 



Kl 



Li + D 



LqLj 1 Lq 



is the Schur complement of Lf. Since S~ vanishes as each component of the vector Ki goes 
to infinity, the variance of the network is solely determined by the variance of the followers, 
Jf = trace (Lj 1 ) , where Lf is the reduced Laplacian matrix obtained by removing all columns 
and rows corresponding to the leaders from L. 



'Note that D x does not show in (|3| since the partition is performed with respect to the indices of the and 1 diagonal 
elements of D x . 
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B. Connections to the sensor selection problem 

The problem of estimating a vector ip e W l from m relative measurements corrupted by 
additive white noise 

Vij = Ipi ~ i>j + Wij 

arises in distributed localization in sensor networks. We consider the simplest scenario in which 
all ipi's are scalar-valued, with ipi denoting the position of sensor i; see [10], [11] for vector- 
valued localization problems. Let X r denote the index set of the m pairs of distinct nodes between 
which the relative measurements are taken and let belong to IR n with 1 and —1 at its ith and 
jth elements, respectively, and zero everywhere else. Then, 

or, equivalently in the matrix form, 

y r = Ejij + w r (4) 

where y r is the vector of relative measurements and E r e ]R nxm is the matrix whose columns 
are determined by for e X r . Since ip + at for any scalar a results in the same y r , with 
relative measurements the position vector ip can be determined only up to an additive constant. 
This can also be verified by noting that E^t = 0. 

Suppose that N t sensors can be equipped with GPS devices that allow them to measure their 
absolute positions 

y a = E T a i, + E T a w a 

where E a e M. nxNl is the matrix whose columns are determined by e i5 the ith unit vector in 
W n , for % E l a , the index set of absolute measurements. Then the vector of all measurements is 
given by 



Vr 




' El' 


ip + 


I 




w r 










Va 






El _ 




W a 



(5) 



where w r and w a are zero-mean white stochastic disturbances with 

S(w r wJ) = W r , S(w a wl) = W a , S(w r wl) = 0. 
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In Appendix |A} we show that the problem of choosing N t absolute position measurements 
among n sensors to minimize the variance of the estimation error is equivalent to the noise- 



corrupted leader selection problem (LSI) Furthermore, when the positions of iVj sensors are 
known a priori we show that the problem of assigning Ni sensors to minimize the variance of 
the estimation error amounts to solving the noise-free leader selection problem (LS2)| 



III. Linear approximation and soft constraint method: noise-free leaders 
In this section, we provide an alternative expression for the objective function Jf in the 



noise-free leader selection problem (LS2) We use this explicit expression to identify the source 



of nonconvexity and to suggest an LMI-based convex approximation. We then relax the hard 



constraint of having exactly N t leaders in (LS2) by augmenting the objective function Jf with 



the t\ norm of the optimization variable x. This soft constraint approach yields a parameterized 
family of optimization problems whose solution provides a tradeoff between the t\ norm of x 
and the convex approximation of the variance amplification of the network. 

A. Explicit expression for the objective function 



Since the objective function Jf in (LS2) is not expressed explicitly in terms of the optimization 
variable x, it is difficult to examine its basic properties (including convexity). We next provide 
an alternative expression for Jf that allows us to establish lack of convexity and to suggest an 
LMI-based convex approximation of Jf. 

Proposition 1: For networks with at least one leader, the objective function Jf in the noise-free 



leader selection problem (LS2) can be written as 

J f = trace (Lj 1 ) = trace ((I - D X )(G + D x o L) _1 (J - D x )) (6) 
where o denotes the elementwise multiplication of matrices, and 

G = (I — D X )L(I — D x ), A, = diag (x) , x t G {0,1}, i = l,...,n. 

Furthermore, Jf is a nonconvex function of x over the smallest convex set Xi G [0, 1] that 
contains feasible points x,- L G {0, 1} for i = 1, . . . ,n. 
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Proof: After an appropriate relabeling of the nodes (as done in ([3])), L and D x can be 
partitioned conformably into 2x2 block matrices, 



p := n — Ni 



which leads to 



Li 


L 


, D x = 






T T 


Lf 






Opxp 



G 



OpxTV, 



, D x oL 



InixNi ° £j O^xp 



J pxp 



G + D x oL 



In 1 xn 1 ° ^ O^xp 

Since I NlxNl oLi is a diagonal matrix with positive diagonal elements and since the principal sub- 
matrix Lf of the Laplacian L is positive definite for connected graphs [1, Lemma 10.36], we have 



G + D x o L y 0. 



(7) 



Consequently, 



trace ((I - J D X )(G + A, o L) -1 ^ - £>*)) = trace (L^ 1 ) 



which yields the desired result ([6]). 

We next use a counterexample to illustrate the lack of convexity of Jf over Xj G [0,1]. Let 





1 


-1 




X\ 


L = 






, D x = 




-1 


1 




x 2 



with x\ G [0, 1] and x 2 = 1. From 

G + LoD x = 



1 — Xi) 2 + Xi 
1 



>~ and J 



1 - Xi) 2 + Xi 



it can be verified that, for x\ G [0, 1/3], the second derivative of J/ with respect to x\ is negative 
(implying that Jf is not convex). ■ 
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Explicit expression ([6]) in conjunction with Schur complement can be used to convert the 
minimization of Jf into the following problem 



minimize trace (X) 

X,x 



subject to 



X 



I — D x G + D x o L 



>- 



(8) 



where X e R nxn ' 
0, we have 

X 



is a symmetric positive definite matrix. To see this, note that since G+D x oL y 



I-D n 



I-D x G + D x oL 



h X >z (I - D X )(G + D x o L)-\I - D x ). 



Thus, to minimize trace (X) subject to the inequality constraint, we take X = (I — D X )(G + 
D x o L)~ l (I — D x ), which shows equivalence between the objective functions in ^ and in 



Thus, the noise-free leader selection problem (LS2) can be formulated as 



minimize trace (X) 

X,x 



subject to 



y o 



(9) 



X I — D x 

I — D x G + D x o L 
G = (I - D X )L(I - D x ) 

D x = diag(x), l T x = N h Xi e {0,1}, i = 1, 

In addition to the Boolean constraints, the quadratic dependence of G on D x provides another 
source of nonconvexity in ([9]). Thus, in contrast to (LSI), relaxation of the Boolean constraints to 
Xi G [0, 1] for i — 1, . . . , n is not enough to guarantee convexity of the optimization problem 



, n. 



B. Linear approximation and soft constraint method 

As established in Section |TH-A[ the alternative formulation ([9]) of the noise-free leader selection 
problem (LS2)| identifies two sources of nonconvexity: the quadratic matrix inequality and the 
Boolean constraints. In view of this, we use linearization of the matrix G to approximate the 
quadratic matrix inequality in ([9]) with an LMI. Furthermore, instead of imposing Boolean 
constraints, we augment the objective function with the l\ norm of x. This choice is used 
as a proxy for obtaining a sparse solution x whose nonzero elements identify the leaders. 
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The idea of using linearization comes from [31], where a linear approximation of the objective 
function trace (YZ) at the point (Y , Z ) was considered 

(1/2) trace (Y Z + YZ ). 

To design fixed-order output feedback controllers, the authors of [31] minimize trace (Y Z+YZ ) 
with respect to Y and Z, set Y <— Y , Z ^— Z, and repeat. Motivated by this iterative scheme, 
we consider the following linear approximation of G 

G := (1/2) (I - D x ) L(I - D X0 ) + (1/2) (/ - D Xo ) L (I - D x ) (10) 

where D Xo is our current-best-estimate of D x . Replacing G with G leads to an LMI approxi- 
mation of the quadratic matrix inequality in Q. 

In addition to the linearization, we relax the hard constraint t T x = Ni for Boolean- valued x 
with a soft one. This is achieved by augmenting the objective function with the l\ norm of x, 



trace (X) + 7 



n 
1 = 1 



where the positive number 7 characterizes our emphasis on the sparsity of the vector x. We note 
that the t\ norm \\x\\e 1 is a widely used proxy for promoting sparsity [32, Chapter 6]. Putting 
this soft constraint approach and linearization ( fl0| ) together, we obtain a convex optimization 
problem 



minimize trace (X) + 7 \^ 

i = l 



n 



subject to 



X I — D x 

I - D x G + D x o L 



y 



(ii) 



G = (1/2) (/ - D x ) L(I - D X0 ) + (1/2) (/ - D xo ) L (/ - D x ) 
D x = diag(a;) 

which can be solved efficiently for small size problems (e.g., n < 30) using standard software 
such as CVX [33]. For large problems, we develop a customized algorithm in Appendix |Bj 



For a fixed value of 7, we start with D XQ = and solve problem (11) as part of an iterative 



loop; the solution D x = diag (x) at every iteration is treated as the current-best-estimate D 
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diag (xq) for the linearization in the next iteration until \\x — x \\ 2 < e. Ranging 7 from small 



to large values, the solution to the 7-parameterized family of problems (11) provides a tradeoff 
between minimization of trace (X) and minimization of Wx]]^. Larger values of 7 promote 
smaller H^H^ and typically lead to fewer nonzero elements in x. Depending on the structure of 
the network, there may not exist values of 7 that lead to a vector x with exactly Ni nonzero 
elements. In this case, we find the solution x* that has the least number of nonzero elements 
iV* with iV* > Ni, and use the indices of the iVj largest entries of x* to determine the leaders. 

C. Examples 



1) An example from [18]: We next use the soft constraint method of Section III-B to select 
leaders for a small network with 25 nodes shown in Fig. [T] As shown in Figs. 2a and 2b the 
number of leaders Ni decreases and the variance Jf of the followers increases with 7. The 
tradeoff between the number of leaders and the variance of followers is illustrated in Fig. [2cJ 

Figure [3] compares performance of the soft constraint method to performance of the greedy 
algorithm [18]-[20], which chooses one leader at a time by assigning the node that provides the 
largest performance improvement as a leader. Using a supermodular optimization framework, it 
was shown in [20] that the greedy algorithm selects noise-free leaders that are within a provable 



performance bound from the global solution to (LS2) This motivates use of greedy algorithm 



as a benchmark for performance of the soft constraint method. As shown in Fig. |3a} for small 
number of leaders (e.g., iVj < 5), the greedy algorithm outperforms the soft constraint method; 
the only exception happens for Ni = 3. A more detailed comparison is reported in Table |I| with 



the global solution to (LS2) for Ni < 5 resulting from exhaustive search. 



When the number of leaders is large (e.g., iVj > 9), the soft constraint method outperforms the 



greedy algorithm; see Fig. 3b The heuristics of assigning nodes with large degrees (i.e., large 
number of neighbors) as leaders is outperformed by both greedy and soft constraint methods. The 
poor performance of the simple degree-heuristics-based-selection was also noted in [18]-[20]. 

2) A random network example: We next consider the selection of noise-free leaders in a 
network with 100 randomly distributed nodes in a unit square. A pair of nodes can communicate 
with each other if their distance is not greater than 0.2. This scenario arises in sensor networks 



with prescribed omnidirectional (i.e., disk shape) sensing range [1]. As shown in Figs. 4a and 4b 
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Fig. 1: A small network with 25 nodes [18]. 
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(b) Variance of the network Jf 
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2*5 



(c) Tradeoff between Ni and Jf 



Fig. 2: Performance of the soft constraint method for the network shown in Fig. [TJ (a) the number 
of leaders Ni decreases with 7; (b) the variance of the followers Jf increases with 7; and (c) 
the tradeoff between Ni and Jf. 
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Fig. 3: (a) The variance of the followers Jf obtained using the soft constraint method (o), 
the greedy algorithm (*), and the degree heuristics (+) for the network shown in Fig. [T] (b) 
Comparison of three algorithms for iVj > 9. 
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TABLE I: Performance comparison of greedy algorithm and soft constraint method with the 
global solution to the noise-free leader selection problem (LS2) for the network shown in Fig. [TJ 





global solution 


greedy algorithm 


soft constraint 




Jf 


leaders 


Jf 


leaders 


Jf 


leaders 


1 


66.0 


13 


66.0 


13 


112.0 


25 


2 


38.4 


8,25 


44.8 


13,25 


64.0 


16,25 


3 


30.0 


8,16,25 


33.3 


7, 13,25 


32.1 


7, 16,25 


4 


25.3 


7,9,16,25 


27.4 


7,13,16,25 


29.4 


7, 16,20,25 


5 


20.7 


3,7,9,16,25 


22.2 


3,7, 13,16,25 


22.6 


3,7, 16,20,25 



the number of leaders iVj decreases and the variance Jf of followers increases with 7; also see 



the tradeoff curve between iVj and Jf in Fig. 4c 



For this random network example, we observe similar selection of leaders and similar per- 
formance of the soft constraint and greedy algorithms. Furthermore, for Ni > 1, both these 
algorithms significantly outperform the degree-heuristics-based-selection; see Fig. [5] To gain 
some insight into the selection of leaders, we compare the results obtained using soft constraint 



method and the degree heuristics. As shown in Fig. 6b the degree heuristics chooses nodes that 
turn out to be in the proximity of each other. In contrast, the soft constraint method select leaders 



that, in addition to having large degrees, are far from each other; see Fig. 6a As a result, the 
selected leaders can influence more followers and thus more effectively improve the performance 
of the network. 

The contrast between degree heuristics and soft constraint method becomes even more dramatic 



for large number of leaders. As shown in Figs. [6c] and [6dJ the leader sets obtained using the 
soft constraint method and degree heuristics are almost complements of each other. While the 
degree heuristics clusters the leaders around the center of the network, the soft constraint method 
distributes the leaders around the boundary of the network. 



Figures 7a and 7b show the degree distribution of all the nodes in the random network and of 



the 41 nodes that are selected as leaders (see Fig. 6c). In contrast to the degree heuristics, the 
soft constraint method chooses nodes with both large- and small-degrees as leaders; in particular, 
all nodes with degree less than 8 and all nodes with degree greater than 18 are selected. 
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Fig. 4: A random network with 100 nodes: (a) the number of leaders Ni decreases with 7; (b) 
the variance of the followers Jf increases with 7; and (c) the tradeoff curve between Ni and Jf. 




Fig. 5: The objective function Jf obtained using the soft constraint method (o), the greedy 
algorithm (*), and the degree heuristics (+) for the random network. 



IV. Lower and upper bounds on global performance: noise-corrupted leaders 



In contrast to the noise-free leader selection problem (LS2), we next show that the objective 



function in the noise-corrupted leader selection problem (LSI) is convex. We take advantage 
of the convexity of J in |(LS1) and develop efficient algorithms to compute lower and upper 



bounds on the global optimal value J opt of (LSI) A lower bound results from convex relaxation 



of Boolean constraints in (LSI) Furthermore, upper bounds are obtained using an efficient greedy 
algorithm and the alternating direction method of multipliers (ADMM). Greedy algorithm selects 
one leader at a time, which introduces low-rank modifications to the Laplacian matrix. We exploit 
this feature in conjunction with the matrix inversion lemma to gain computational efficiency. On 
the other hand, the ADMM algorithm handles the Boolean constraints explicitly by a simple 
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(c) Ni = 41 (d) Ni = 40 

Fig. 6: Selection of leaders (•) for the random network example using soft constraint method in 
(a) and (c) and using degree heuristics in (b) and (d). 



projection onto a discrete nonconvex set. Finally, we provide two examples to illustrate the 
performance of the developed approach. 

A. Convex relaxation to obtain a lower bound 



Since the objective function J in (LSI) is the composition of a convex function trace {L^ 1 ) 
of a positive definite matrix L y with an affine function L = L + D K D X , it follows that J is 
a convex function of x. By enlarging the Boolean constraint set x^ e {0, 1} to its convex hull 
xi E [0, 1] (i.e., the smallest convex set that contains the Boolean constraint set), we obtain a 
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Fig. 7: The degree distribution of (a) the random network of Section III-C.2 and of (b) 41 leaders 
selected using soft constraint method. Note that the soft constraint method chooses all nodes 
with degree less than 8 and all nodes with degree greater than 18. 



convex relaxation of (LSI) 



minimize J(x) = trace ((L + D K D X ) x ) 

X 

subject to t T x = Ni, < a;, < 1, i — 1, 



(CR) 



, n. 



Since we have enlarged the constraint set, the solution x* of the relaxed problem (CR) provides 
a lower bound on J opt . However, x* may not provide a selection of Ni leaders, as it may turn 



out not to be Boolean- valued. If x* is Boolean-valued, then it is the global solution of (LSI) 



Following similar argument as in Section III-A, Schur complement can be used to formulate 
the convex optimization problem |(CR)| as an SDP 

minimize trace (X) 

X,x 



subject to 



y o 



x I 

I L + D K D X 

\ T X = Nr, < X 4 < 1, 



n. 



For small networks (e.g., n < 30), this problem can be solved efficiently using standard SDP 
solvers. For large networks, we develop a customized interior point method in Appendix |Cj 
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B. Greedy algorithm to obtain an upper bound 



With the lower bound on the optimal value J opt resulting from the convex relaxation (CR) 



in Section IV-A we next use a greedy algorithm to compute an upper bound on J opt . This 
algorithm selects one leader at a time by assigning the node that provides the largest performance 
improvement as the leader. Once this is done, an attempt to improve a selection of Ni leaders 
is made by checking possible swaps between the leaders and the followers. Similar greedy 
algorithms have been used in [18], [19] for noise-free leader selection problem. In the noise- 
corrupted problem, we show that substantial improvement in algorithmic complexity can be 
achieved by exploiting structure of the low-rank modifications to the Laplacian matrix. 

1) One-leader-at-a-time algorithm: As the name suggests, we select one leader at a time by 
assigning the node that results in the largest performance improvement as the leader. To select 
the first leader, we compute 

J\ = trace ((L + fi^ef 

for i — 1, . . . , n, and assign the node, say V\, that achieves the minimum value of { J\}. If two 
or more nodes provide the largest performance improvement, we select one of these nodes as a 
leader. After choosing s leaders, v i, . . . , v s , we compute 

J l s+ x = trace ((L s + K^ef 
L s = L + k Vi e Vl + • • • + fi^e^e^ 

for % {vi, . . . , v s }, and select node v s+ i that yields the minimum value of {J s ' +1 }. This 
procedure is repeated until all TVj leaders are selected. 

Without exploiting structure, the above procedure requires 0(n 4 Ni) operations. On the other 
hand, the rank-1 update formula obtained from matrix inversion lemma 

1 -f K,id i 1j s t-i 



yields 



f s+1 = traced) " ; ^ U - ] ^ 



1 + Hi(L s 

where (Lj^i is the ith column of L~ l and (L' 1 )^ is the iith entry of Lj 1 . To initiate the 
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algorithm, we use the generalized rank-1 update [34], 

L^ 1 = L) - (L t e i )l T - l(L t e i ) T + ((l/«<) + ejL^e^lV 

and thus, 

J\ = trace (L f ) + n((l/^) + ejL^et) 
where iJ denotes the pseudo-inverse of L (e.g., see [35]) 

L f = (L + 11 T /^) _1 - H r /n. 



Therefore, once L" 1 is determined, the inverse of the matrix on the left-hand- side of (12) can be 
computed using 0(n 2 ) operations and J] +1 can be evaluated using 0(n) operations. Overall, Ni 
rank-1 updates, nNi/2 objective function evaluations, and one full matrix inverse (for computing 
L^ 1 ) require 0(n 2 Ni +n 3 ) operations as opposed to 0{n A Ni) operations without exploiting the 
low-rank structure. In large-scale networks, further computational advantage can be gained by 
exploiting structure of the underlying Laplacian matrices; see [36]. 

2) Swap algorithm: Having determined a selection of leaders using one-leader- at-a-time 
algorithm, we swap one of the iVj leaders with one of the n — Ni followers, and check if 
such a swap leads to a decrease in J. If no decrease occurs for all (n — Ni)Ni swaps, the 
algorithm terminates; if a decrease in J occurs, we update the leader and then restart checking 
the possible (n — Ni)Ni swaps for the new leader selection. This swap procedure has been used 
as an effective means for improving performance of combinatorial algorithms encountered in 
graph partitioning [37], sensor selection [38], and community detection problems [39]. 

Since a swap between a leader i and a follower j leads to a rank-2 modification ( fl"3| ) to the 
matrix L = L + D K D X , we can exploit this low-rank structure to gain computational efficiency. 
Using the matrix inversion lemma, we have 

(L - K iei eJ + K.ejejy 1 = L~ l - L' 1 Eq (h + JSp-%)- 1 JSg L" 1 (13) 

where Eij = [e« ej], = [ — KiCi KjCj], and I2 is the 2x2 identity matrix. Thus, the 
objective function after the swap between leader % and follower j is given by 

Jij = J - trace ((/ 2 + 1^1. ■ E,, ) 1 L^E^ . (14) 
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6 
9 

Fig. 8: A 3 x 3 grid. 



Here, we do not need to form the full matrix L 2 , since 



E Ij L 2E ij 



-K t (L- 2 ] 



.yi 



Kj(L- 



and the r/'th entry of L 2 can be computed by multiplying the 2th row of L 1 with the jth 
column of L^ 1 . Thus, evaluation of Jy takes 0(n) operations and computation of the matrix 



inverse in (13) requires 0{n 2 ) operations. 

Remark 1: Since the total number of swaps for large-scale networks can be large, we fol- 
low [38] and limit the maximum number of swaps with a linear function of the number of nodes 
n. On the other hand, particular structure of networks can be exploited to reduce the required 
number of swaps. To illustrate this, let us consider the problem of selecting one leader in a 
network with 9 nodes shown in Fig. [8j Suppose that nodes in set Si := {1,3,7,9} have the 
same feedback gain Ki and that nodes in set 5*2 := {2, 4, 6, 8} have the same feedback gain k 2 . 
In addition, suppose that node 5 is chosen as a leader. Owing to symmetry, to check if selecting 
other nodes as a leader can improve performance we only need to swap node 5 with one node 
in each set Si and S 2 . We note that more sophisticated symmetry exploitation techniques have 
been discussed in [26], [40]. 



C. Alternating direction method of multipliers 

Since the previously introduced greedy algorithm may not yield an optimal selection of leaders, 
we next employ the ADMM algorithm [17] as an alternative approach to a selection of N t leaders 



for problem (LSI) Although the convergence of this method depends on the initial conditions and 
on the algorithmic parameters, ADMM is capable of handling the nonconvex Boolean constraints 
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explicitly by a simple projection onto a discrete nonconvex set 



C := {x\ l T x = Ni, Xi G {0,1}, % = l,...,n}. 



(15) 



We can rewrite (LSI) as an unconstrained optimization problem 



minimize J{x) + I(x) 



X,x 



(16) 



where X(x) is indicator function associated with set C 



l(x) 



if x e C 
- oo if x C. 



Now, (16) can be put into the following equivalent form suitable for the application of ADMM 



minimize J(x) + X{z) 

X, z 

subject to x — z = 



(17) 



and the augmented Lagrangian associated with (17) is given by 



Cp(x,z,X) = J{x) + I(z) + X T (x — z) + — \\x — z\\l 

where A G IR n is the dual variable and p is a positive number. For k = 0, 1, . . ., the ADMM 
algorithm updates x, z, and A in an iterative fashion 



x 



k+l 



argmin C p (x, z k , 



y k+l 



argmin C p (x k+1 , z, X k ) 



X k+i ._ X k + p ^ x 



k+l 



z k+l ) 



(18a) 
(18b) 
(18c) 



until \\x k+l - z k+l \\ 2 < e and \\z k+1 - z k \\ 2 < e. 

Splitting the optimization variables into two copies {x, z} and updating them in an alternating 
fashion yields the minimization problems (18a) and ( |18b ) that are easy to solve. 



1) x-minimization step: By completion of squares in C p with respect to x, problem (18a) can 
be expressed as 

minimize trace ((L + D^D^ 1 ) + - \\x - u k \\\ (19) 
x 2 
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where 



u 



:vp)A fc . 



Since (fT9]) is equivalent to the following problem, 



minimize trace ((L + D K D X ) l ) + p 
subject to ^ || x — n fe ||2 < /i 



it can be expressed as an SDP 



minimize trace (X) + p 



subject to 



X 



I L + D K D X 



y o 



/ x — u 
(x - u k ) T 2/i/p 



y o 



where the second LMI, resulting from the use of Schur complement, is an alternative way of 
writing the quadratic constraint 

- (x- u k ) T (x - u k ) > 0. 



Thus, for small networks, problem (19) can be solved efficiently using standard SDP solvers. 
For large networks, we use descent methods [32] (e.g., Newton's method) with the gradient and 
Hessian of C p with respect to x being given by 



V 2 £, 



- « o diag ((L + D K D X )~ 2 ) + p {x - u h 



.„ - 2 (D K (L + D K D X )- 2 D K ) o (L + D K D x )~ l + pi 

where diag (M) denotes the vector determined by the main diagonal of a matrix M. 

2) z -minimization step: Using similar argument as in [17, Section 9.1] (see Appendix [D] for 



details), the ^-minimization problem (18b) can be solved explicitly using a simple projection 
onto the set C 



1 if v\ > [v k ] Nl 
if v k < [v k ] Nl 



(20) 
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TABLE II: Lower and upper bounds on the noise-corrupted leader selection problem (LSl)| for the 
example shown in Fig. [TJ Lower bounds are obtained by solving the convex relaxation |(CR)^ 
upper bounds J u b from greedy algorithm - the one-leader-at-a-time algorithm followed by the 
swap algorithm - are actually tight, i.e., J ub = J op t; upper bounds J uh from ADMM are tight 

for Ni = 4, 5. 





greedy algorithm 


ADMM 


Ni 


Jib 


Jub 


leaders 


Jub 


leaders 


1 


38.4 


72.3 


13 


118.3 


25 


2 


30.3 


43.4 


8,25 


47.9 


7,25 


3 


26.7 


35.2 


8,16,25 


36.7 


7, 16,25 


4 


24.3 


30.0 


3,7, 16,25 


30.0 


3,7, 16,25 


5 


22.4 


25.8 


3,7,9,16,25 


25.8 


3,7,9,16,25 



where 

v k : = x k +! + (l/p)\ k 

and [v fc ]jv ; denotes the (A^)th largest entry of v k . We note that reference [17] provides related 
projections onto several important nonconvex sets. 

D. Examples 

We next provide several examples to illustrate the performance of the developed methods. In 
all examples we set Ki to be the degree of node i. We set the initial conditions of the ADMM 
algorithm to {z° = 0, A = 0} and the penalty weight to p = 10 3 . 



1) A small network from [18]: For the example discussed in Section III-C.l with N t < 
5, we determine the global minima to the noise-corrupted leader selection problem (LSI) by 
exhaustive search. It turns out that the one-leader-at-a-time algorithm followed by the swap 
algorithm actually finds the global minima. As shown in Table |II} ADMM provides the global 
minima for the problems with 4 and 5 leaders. It is also worth mentioning that the globally 
optimal selection of noise-corrupted leaders coincides with the globally optimal selection of 
noise-free leaders (cf. Table [I]). 



Figure 9a shows lower bounds resulting from convex relaxation and upper bounds resulting 



from ADMM and from greedy algorithm. As the number of leaders iVj increases, the gap between 



the lower and upper bounds from greedy algorithm decreases; see Fig. 9b 



February 5, 2013 



Submitted to IEEE Trans. Automat. Control 



24 




(a) (b) 

Fig. 9: The network with 25 nodes: (a) lower bounds (— ) resulting from convex relaxation and 
upper bounds resulting from greedy algorithm (i.e., one-leader-at-a-time algorithm followed by 
swap algorithm) (+) and from ADMM (o); (b) the gap between lower bounds and upper bounds 
resulting from greedy algorithm. 



2) A 2D lattice: We next consider the leader selection problem for a 9 x 9 regular lattice. 



Figure 10a shows lower bounds resulting from convex relaxation and upper bounds resulting 
from ADMM and from greedy algorithm, i.e., the one-leader-at-a-time algorithm followed by 
the swap algorithm. As the number of leaders N[ increases, the gap between the lower and upper 



bounds from greedy algorithm decreases; see Fig. 10b For iVj = 1, . . . , 40, the number of swap 
updates ranges between 1 and 19 and the average number of swaps is 10. 



Figure 1 1 shows selection of leaders resulting from the greedy algorithm for different choices 
of Ni. For Ni = 1, the center node (5, 5) provides the optimal selection of a single leader. As Ni 
increases, nodes away from the center node (5,5) are selected; for example, for Ni = 2, nodes 
{(3,3), (7,7)} are selected and for Ni = 3, nodes {(2,6), (6,2), (8,8)} are selected. Selection 
of nodes farther away from the center becomes more significant for Ni = 4 and jVj = 8. 



The selection of leaders exhibits symmetry shown in Fig. 1 1 In particular, when N t is large, 



almost uniform spacing between the leaders is observed; see Fig. 1 If for Ni = 40. This is in 
contrast to the selection of leaders along the boundary nodes in the random network example in 



Fig. 6c For the random network example in Section III-C.2 , the selection of the noise-corrupted 
leaders resembles that of the noise-free leaders (results are omitted for brevity). 
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5 10 15 20 25 30 35 40 10 20 30 40 



N l N l 
(a) (b) 

Fig. 10: A 2D lattice: (a) lower bounds (— ) resulting from convex relaxation and upper 
bounds resulting from greedy algorithm (i.e., one-leader-at-a-time algorithm followed by swap 
algorithm) (+) and from ADMM (o); (b) the gap between lower bounds and upper bounds 
resulting from greedy algorithm. 



3 4 5 6 7 
(a) iV ; = 1 



8 9 



2 3 4 5 6 7 8 9 
(d) iV, = 4 



1 2 3 4 5 6 7 8 9 
(b) Nl = 2 



a 



1 2 3 4 5 6 7 8 
(e) N t = I 



id 



1 2 3 4 5 6 7 8 9 
(C) iVi = 3 

J 1 T 1 T i T U 



31 <) 

4» — 



rfiliii 8 

(fj Ni = 40 



Fig. 11: Selections of leaders (•) obtained using the one-at-a-time algorithm followed by the 
swap algorithm for a 2D lattice. The two selections of two leaders denoted by (•) and (*) in (b) 
provide the same objective function J. The four selections of three leaders denoted by (•), (*), 
(x), and (o) in (c) provide the same J. 



February 5, 2013 



Submitted to IEEE Trans. Automat. Control 



26 



V. Concluding remarks 

The main contribution of this paper is the development of efficient algorithms that facilitate 
selection of leaders in large stochastically forced consensus networks. For the noise-corrupted 
leader selection problem |(LS1)[ we focus on computing lower and upper bounds on the global 
optimal value. A lower bound is obtained by solving a convex relaxation, and upper bounds result 
from a simple but efficient greedy algorithm and the alternating direction method of multipliers. 



For the noise-free leader selection problem (LS2), we provide an explicit expression for the 



variance amplification of the network. This allows us to identify sources of nonconvexity and to 



propose a convex relaxation of the objective function in (LS2) Furthermore, we use augmentation 
of the objective function with the l\ norm of the vector of optimization variables as a surrogate 
for obtaining a sparse solution whose nonzero elements identify the leaders. Several examples are 
provided to illustrate the effectiveness of our algorithms. We are currently applying these tools for 
leader selection problems in different types of networks, including small-world social networks. 

Appendix 

A. Equivalence between leader selection and sensor selection problems 

We next show that the problem of choosing Ni absolute position measurements among n 



sensors to minimize the variance of the estimation error in Section II-B is equivalent to the 



noise-corrupted leader selection problem (LSI) 



Given the measurement vector y in ([5]), the linear minimum variance unbiased estimate of tp 
is determined by [41, Chapter 4.4] 

i> = (ErW-'Ej + E a {ElW a E a )- l E T a )-\E T W; l y r + E a {E T a W a E a )- 1 y a ) 

with the covariance of the estimation error 

S = S^-^-^f) = {E r W; x E T r + E a {E T a W a E a y l E T a y\ 

Furthermore, let us assume that 

W r = /, W a = D-\ 

The choice of W a indicates that a larger value of k« corresponds to a more accurate absolute 
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measurement of sensor i. Then 



(E^WaEa)- 1 = {E T a D- l E a y l = EjD K E a 



and thus, 



E = (E r E r + E a E a D K E a E a 



T\-l 



Since E a E^ is a diagonal matrix with its zth diagonal element being 1 for i e X a and E r Ej is 
the Laplacian matrix of the relative measurement graph, it follows that 



D x = E a Ej, L = E r Ej, £ = (L + D X D K D, 



1-1 



(L + D K D 3 



i-i 



where D X D K D X = D K D X because D x and _D K commute and D X D X = D x . Therefore, we have 



established equivalence between the noise-corrupted leader selection problem (LSI) and the 
problem of choosing Ni sensors with absolute position measurements such that the variance of 
the estimation error is minimized. 

To formulate an estimation problem that is equivalent to the noise-free leader selection prob- 



lem (LS2) we follow [10] and assume that the positions of Ni sensors are known a priori. Let 
ipi denote the positions of these reference sensors and let ipf denote the positions of the other 
sensors. We can thus write the relative measurement equation Q as 

y r = Ejip + w r = Efipi + Ejijjf + w r 

and the linear minimum variance unbiased estimate of ipf is given by 

{p f = (EfEjy'EfW- 1 (y r - Ej^) 

with covariance of the estimation error 



(E f E 



jr 1 - 



Identifying EfEj with Lf in the Laplacian matrix 



E r E r 



E\Ej E]E T t 



EfEf EfE^f 



<4*f 

! f Ej 



Li Lr : 



'0 
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establishes equivalence between problem |(LS2)| and the problem of assigning iVj sensors with 



known reference positions to minimize the variance of the estimation error of sensor network. 



B. ADMM for the soft constraint method 



We next employ ADMM for the soft constraint method developed in Section III-B We consider 
the following minimization problem 

minimize f(x) + 7 H^H^ 

X 

where / is the convex approximation of ([6]) 

f(x) = trace ((J - D X ){G + D x oL)-\l - D x )) 



and G is the linear approximation of G given by (10). This problem is equivalent to the 
constrained problem 

minimize f(x) + 7 \\z\\e 1 

X, z 

subject to x — z = 
and the associated augmented Lagrangian function is given by 

£ p (x,z,\) = f(x) + 7lMk + A T (x - z) + ^\\x - z\\l. 



By completion of squares in C p with respect to z, the z-minimization problem (18b) can be 
expressed as 

• • • 11 11 P 11 k\\2 

minimize 7 \\z U + - \\z — v 2 

z 2 

where v k = x k+1 + (1 / p) \ k . The solution is given by the soft thresholding operator (e.g., see [17, 
Section 4.4.3]) 

1 " rrr J v t \ v f\ > i/p 



,■(>■';) { V k*l/ " (2i) 

0, \vf\ < 7/p 

for i — 1, . . . , n. On the other hand, by completing squares in £ p with respect to x, we obtain 

minimize 4>(x) = f(x) + - ||x — || | 
x 2 
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where u k = z k — (1/ p)\ k . This problem can be solved using descent methods (e.g., gradient 
method [32]). Here, we provide the expression for the gradient of <p 

V0(x) = - 2 diag ((/ - D x )M~ l ) + diag (L(I - D Xo )M~\I - D x ) 2 M~ l ) 
- diag (AT 1 (I - D X ) 2 M~ X ) o diag (L) + p(x - u k ) 

where M = Gq + D x o L. 



C. Customized interior point method for \(CR) 



We begin by augmenting the objective function in (CR) with log-barrier functions associated 
with the inequality constraints on Xi 

n 

minimize q(x) = rtrace((L + D K D x )~ l ) + N (— log(xj) — log(l — xi)) 

fri (22) 

subject to t T x = N t . 

The solution of the approximate problems ( [22] ) converges to the solution of the convex relax- 
ation [(CR)] as the positive scalar r increases to infinity [32, Section 11.2]. We solve a sequence 
of problem ([22]) by gradually increasing r, and by starting each minimization using the solution 
from the previous value of r. We use Newton's method to solve d22b for a fixed r, and the 



Newton direction for problems with linear constraints is given by (e.g., see [32, Section 10.2]) 

Xnt = - (V 2 g)^Vg - S^q)- 1 ! 

where 

l T (V 2 g)- 1 Vg 

t T (v 2 q)-n ' 

Here, the expressions for the ith entry of the gradient direction Vg and for the Hessian matrix 
are given by 

{Vq)i = - t Ki ((L + D K D x )~ 2 )n - x; 1 - (Xi-iy 1 
V 2 g = 2r (D K (L + D K D X )~ 2 D K ) o (L + D^y 1 + diag {xf + (1 - Xi y 2 ) . 

We next examine complexity of computing the Newton direction x nt . The major cost of 
computing V 2 g is to form (L + D K D X )~ 2 , which takes (7/3)n 3 operations to form (L+ D^Dx)' 1 
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and n 3 operations to form (L + D K D X ) 2 . Computing x nt requires solving two linear equations, 

(V 2 q)y = -Vq, {V 2 q)z = -1 

which takes (l/3)n 3 operations using Cholesky factorization. Thus, computation of each Newton 
step requires (7/3 + 1 + l/3)n 3 = (ll/3)n 3 operations. 



D. Derivation of (20) 



We use completion of squares to obtain the following problem which is equivalent to (18b) 

minimize (p/2)\\z — v k \\ 2 , 

z 

subject to z E C 



where v k = x k+1 + (l/p)X k and the set C is given by (15). Projecting v onto C yields 



1 if v k > [v k } Nl 

J ' (23) 
if v k < [v k ] Nl 

where [v k ] Nl is the (iVj)th largest entry of v k . To see this, consider z E C, i.e., t T z = N t and 



Zi E {0, 1}, but z is not the projection determined by (23). Thus, there exists at least one entry 
of z, say the rth entry, such that z r — 1 for v k < [v k ]N r and at least one entry, say the jth entry, 
such that Zj = for v k > [v k ]N r Consider 



6 rj = (Z r ~ V k f + ( Zj - V k f = (1 - V k f + (V k " 2 

and 5 jr = [v k ) 2 + (1 - v k ) 2 . Since 5 rj - 5 jr = 2{v k - v k ) > 0, it follows that the objective 
function (p/2) \\z — v k \\l will decrease if we choose {z r = 0, Zj = 1} instead of {z r = 1, Zj = 0}. 
Therefore, we can reduce the objective function by exchanging the values of two entries z r — 1 



(with v k < [v fc ]jva) and Zj = (with v k > [v ]n { ) until (23) is satisfied for all i = 1, . . . , n. 
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